Class Conditional Nearest Neighbor and Large Margin Instance Selection
Abstract
The one nearest neighbor (1-NN) rule classifies new instances using instance proximity followed by class labeling information. This paper presents a framework for studying properties of the training set related to proximity and labeling information, with the aim of improving the performance of the 1-NN rule. To this end, a so-called class conditional nearest neighbor (c.c.n.n.) relation is introduced, consisting of those pairs of training instances (a, b) such that b is the nearest neighbor of a among the instances (excluding a) of one of the classes of the training set. A graph-based representation of c.c.n.n. is used for a comparative analysis of c.c.n.n. and other related proximity-based concepts. In particular, a scoring function on instances is introduced, which measures the effect of removing one instance on the hypothesis margin of the other instances. This scoring function is employed to develop an effective large margin instance selection algorithm, which is empirically shown to improve the storage and accuracy performance of the 1-NN rule on artificial and real-life data sets.
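The abstract packs several definitions into one paragraph. The sketch below, assuming Euclidean distance and using our own function names and toy data (none of which come from the paper), illustrates the c.c.n.n. relation, the hypothesis margin, and a removal-based scoring function in the spirit described above.

```python
# A minimal sketch of the class conditional nearest neighbor (c.c.n.n.) relation
# and the hypothesis margin it supports. Euclidean distance, the toy data, and
# all function names are illustrative assumptions, not the paper's implementation.
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix with the diagonal masked out, so an
    instance is never its own nearest neighbor."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d

def ccnn_pairs(X, y):
    """For every instance a and every class c, record the pair (a, b) where
    b is the nearest neighbor of a among the instances of class c (excluding a)."""
    d = pairwise_distances(X)
    pairs = []
    for a in range(len(X)):
        for c in np.unique(y):
            members = np.where(y == c)[0]
            b = members[np.argmin(d[a, members])]
            pairs.append((a, b))
    return pairs

def hypothesis_margin(a, y, d):
    """Half the gap between a's nearest miss and nearest hit; positive when
    a's neighborhood supports its own label."""
    hit = np.min(d[a, y == y[a]])
    miss = np.min(d[a, y != y[a]])
    return 0.5 * (miss - hit)

def removal_score(b, X, y):
    """Illustrative scoring function: total change in the remaining instances'
    hypothesis margins when instance b is deleted from the training set."""
    d = pairwise_distances(X)
    keep = np.arange(len(X)) != b
    d_sub = pairwise_distances(X[keep])
    y_sub = y[keep]
    before = [hypothesis_margin(a, y, d) for a in np.where(keep)[0]]
    after = [hypothesis_margin(i, y_sub, d_sub) for i in range(len(y_sub))]
    return sum(af - bf for af, bf in zip(after, before))

# Toy data: two well-separated 1-D classes.
X = np.array([[0.0], [0.5], [1.0], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
print(ccnn_pairs(X, y))
print([round(removal_score(b, X, y), 3) for b in range(len(X))])
```

Under this reading, instances whose removal increases the other instances' margins are natural candidates for deletion in a large margin selection scheme, which is the kind of trade-off the abstract's scoring function is designed to capture.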
Similar resources
Discriminative Learning of the Prototype Set for Nearest Neighbor Classification
The nearest neighbor rule is one of the most widely used models for classification, and selecting a compact set of prototype instances is an important problem for its applications. Many existing approaches to the prototype selection problem rely on instance-based analyses and local criteria on the class distribution, which are intractable for numerical optimization techniques. In this paper, we ...
Liquid-liquid equilibrium data prediction using large margin nearest neighbor
Guanidine hydrochloride has been widely used in the initial recovery steps of active protein from inclusion bodies in aqueous two-phase systems (ATPS). Knowledge of the effects of guanidine hydrochloride on the liquid-liquid equilibrium (LLE) phase diagram behavior is still inadequate, and no comprehensive theory exists for predicting the experimental trends. Therefore, the effect the ...
Large Margin Subspace Learning for feature selection
Recent research has shown the benefits of the large margin framework for feature selection. In this paper, we propose a novel feature selection algorithm, termed Large Margin Subspace Learning (LMSL), which seeks a projection matrix that maximizes the margin of a given sample, defined as the distance between its nearest miss (the nearest neighbor with a different label) and its nearest hit (th...
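Read literally, the margin in this snippet is the standard hypothesis margin evaluated in the learned subspace. One hedged way to formalize it, where P, nearhit, and nearmiss are our notation rather than necessarily the paper's, is

\[
\rho_P(x) \;=\; \lVert Px - P\,\mathrm{nearmiss}(x)\rVert \;-\; \lVert Px - P\,\mathrm{nearhit}(x)\rVert ,
\]

with nearmiss(x) the nearest neighbor of x carrying a different label and nearhit(x) the nearest neighbor carrying the same label; LMSL would then seek the projection matrix P that maximizes this margin over the training samples.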
Time Series Classification by Class-Based Mahalanobis Distances
To classify time series by nearest neighbor, we need to specify or learn a distance. We consider several variations of the Mahalanobis distance and the related Large Margin Nearest Neighbor Classification (LMNN). We find that the conventional Mahalanobis distance is counterproductive. However, both LMNN and the class-based diagonal Mahalanobis distance are competitive.
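For reference, the distances being compared here have a simple closed form; the notation below is a standard textbook statement, not a quotation from the paper:

\[
d_M(x, y) \;=\; \sqrt{(x - y)^{\top} M \,(x - y)}, \qquad M \succeq 0,
\]

where the conventional Mahalanobis distance takes M as the inverse covariance matrix of the data, LMNN learns M from labeled examples, and, as the snippet suggests, a class-based diagonal variant restricts M to a diagonal matrix estimated separately for each class.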
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a vast range of library resources. However, classifying documents within a large amount of data is still an issue, and finding particular documents demands time and energy. Classifying similar documents into specific classes can reduce the time needed to search for the required data, particularly for text documents. This is further facilitated by using Artificial...
Journal:
Volume, Issue:
Pages: -
Publication date: 2008